12 research outputs found

    Tasks Fairness Scheduler for GPU

    Nowadays GPU clusters are available in almost every data processing center. Their GPUs are typically shared by different applications that might have different processing needs and/or different levels of priority. As current GPUs do not support hardware-based preemption mechanisms, it is not possible to ensure the required Quality of Service (QoS) when application kernels are offloaded to devices. In this work, we present an efficient, low-overhead software preemption mechanism that evicts and relaunches GPU kernels to support different preemptive scheduling policies. We also propose a new fairness-based scheduler, the Fair and Responsive Scheduler (FRS), that uses the current value of each kernel's slowdown both to select the next kernel to launch and to set the time interval it will run (its quantum). Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    A Framework For TV Logos Learning Using Linear Inverse Diffusion Filters For Noise Removal

    Logotypes are significant cues for video annotation. A combination of temporal and spatial segmentation methods can be used to extract logos from various video contents. To achieve this segmentation, pixels with low variation of intensity over time are detected. Static backgrounds can become spurious parts of these logos. This paper offers a new way to use several segmentations of logos to learn new logo models from which noise has been removed. First, we group segmented logos of similar appearance into different clusters. Then, a model is learned for each cluster that has a minimum number of members. This is done by applying a linear inverse diffusion filter to all logos in each cluster. Our experiments demonstrate that this filter removes most of the noise that was added to the logo during segmentation and that it successfully copes with misclassified logos that have been wrongly added to a cluster.

    Evaluation of CNN architectures for gait recognition based on optical flow maps

    This work targets the identification of people in video based on the way they walk (i.e., gait) by using deep learning architectures. We explore the use of convolutional neural networks (CNNs) for learning high-level descriptors from low-level motion features (i.e., optical flow components). The low number of training samples for each subject and the use of a test set containing subjects different from the training ones make the search for a good CNN architecture a challenging task. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec

    CUVLE: Variable-Length Encoding on CUDA

    Data compression is the process of representing information in a compact form in order to reduce storage requirements and, hence, communication bandwidth. It has been one of the critical enabling technologies for the ongoing digital multimedia revolution for decades. In the variable-length encoding (VLE) compression method, the most frequently occurring symbols are replaced by codes with shorter lengths. As it is a common strategy in many compression applications, efficient parallel implementations of VLE are very desirable. In this paper we present CUVLE, a GPU implementation of VLE on CUDA. Our approach is on average more than 20 times faster than the corresponding serial CPU implementation and more than 2 times faster than the only known state-of-the-art GPU implementation. Junta de Andalucía, TIC-1692. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tec
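
    The VLE idea described above (shorter codes for more frequent symbols) can be illustrated with a minimal serial sketch. It uses Huffman coding as one common way to build such a code table; this is an assumption made for illustration, not the CUVLE implementation, and the function names `build_codes`, `encode` and `decode` are hypothetical.

```python
import heapq
from collections import Counter


def build_codes(data: str) -> dict[str, str]:
    """Build a prefix-free variable-length code table (Huffman) for data."""
    freq = Counter(data)
    if not freq:
        return {}
    if len(freq) == 1:
        # A single distinct symbol still needs a one-bit code.
        return {next(iter(freq)): "0"}
    # Heap entries: (weight, unique tie-breaker, {symbol: partial code}).
    heap = [(w, i, {s: ""}) for i, (s, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one bit to every code inside them.
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


def encode(data: str, codes: dict[str, str]) -> str:
    """Concatenate the code of every symbol into one bit string."""
    return "".join(codes[s] for s in data)


def decode(bits: str, codes: dict[str, str]) -> str:
    """Decode a bit string; works because no code is a prefix of another."""
    inverse = {c: s for s, c in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return "".join(out)


if __name__ == "__main__":
    text = "aaaabbc"
    table = build_codes(text)
    packed = encode(text, table)
    print(table, packed, decode(packed, table) == text)
```

    The decoding loop is inherently sequential because code boundaries are only known after the preceding bits are read, which is one reason parallel VLE implementations are non-trivial.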

    A Hybrid Piece-Wise Slowdown Model for Concurrent Kernel Execution on GPU

    Concurrent execution of kernels on GPUs allows improving the use of hardware resources and reducing the execution time of co-executed kernels. In addition, efficient kernel-oriented scheduling policies pursuing criteria based on fairness or Quality of Service can be implemented. However, the achieved co-execution performance strongly depends on how GPU resources are partitioned between kernels. Thus, precise slowdown models that accurately predict co-execution performance must be used to fulfill scheduling policy requirements. Most recent slowdown models work with Spatial Multitask (SMT) partitioning, where Streaming Multiprocessors (SMs) are distributed among tasks. In this work, we show that Simultaneous Multikernel (SMK) partitioning, where kernels share the SMs, obtains better performance. However, kernel interference in SMK occurs not only in global memory, as in the SMT case, but also within the SM, leading to high prediction errors. Here, we propose a modification of a previous state-of-the-art slowdown model that reduces the median prediction error from 27.92% to 9.50%. Moreover, this new slowdown model is used to implement a scheduling policy that improves fairness by 1.41x on average compared to even partitioning, whereas previous models reach only 1.21x on average. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech P18-FR-3130 UMA20-FEDERJA-059 PID2019-105396RB-I0

    Gait recognition and fall detection with inertial sensors

    In contrast to visual information, which is recorded by cameras placed in the environment, inertial information can be obtained from mobile phones that are commonly used in daily life. In this talk we present a general deep learning approach for gait and soft biometrics (age and gender) recognition. We also study the use of gait information to detect actions during walking, specifically fall detection. We perform a thorough experimental evaluation of the proposed approach on different datasets: OU-ISIR Biometric Database, DFNAPAS, SisFall, UniMiB-SHAR and ASLH. The experimental results show that inertial information can be used for gait recognition and fall detection with state-of-the-art results. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    A weakly-supervised approach for discovering common objects in airport video surveillance footage

    Object detection in video is a relevant task in computer vision. Standard and current detectors are typically trained in a strongly supervised way, which requires a huge amount of labelled data. In contrast, in this paper we focus on object discovery in video sequences by using sets of unlabelled data. We present an approach based on two region proposal algorithms (a pretrained Region Proposal Network and an Optical Flow Proposal) that produce regions of interest, which are then grouped using a clustering algorithm. Therefore, our system does not require the collaboration of a human except for assigning human-understandable labels to the discovered clusters. We evaluate our approach on a set of videos recorded at the outdoor area of an airport where the aeroplanes park to load passengers and luggage (apron area). Our experimental results suggest that the use of an unsupervised approach is valid for automatic object discovery in video sequences, obtaining a CorLoc of 86.8 and a mAP of 0.374 compared to a CorLoc of 70.4 and a mAP of 0.683 achieved by a supervised Faster R-CNN trained and tested on the same dataset. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Efficient OpenCL-based concurrent tasks offloading on accelerators

    Current heterogeneous platforms with CPUs and accelerators can launch several independent tasks simultaneously in order to exploit concurrency among them. These tasks typically consist of data transfer commands and kernel computation commands. In this paper we develop a runtime approach to optimize the concurrency between data transfers and kernel computation commands in a multithreaded scenario where each CPU thread offloads tasks to the accelerator. It deploys a heuristic based on a temporal execution model for concurrent tasks and is able to establish a near-optimal task execution order that significantly reduces the total execution time, including data transfers. Our approach has been evaluated on five different benchmarks composed of real tasks, some dominated by kernel computation and others by data transfers. In these experiments our heuristic achieves speedups of up to 1.5x on AMD R9 and NVIDIA K20c accelerators and 1.3x on an Intel Xeon Phi (KNC) device. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech
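
    To see why task order matters when transfers and kernels can overlap, the following sketch uses a deliberately simplified timing model. It is an assumption for illustration only, not the paper's temporal model or heuristic: each task is one host-to-device transfer followed by one kernel, transfers are serialized on one copy engine, kernels are serialized on the compute engine, and a kernel starts once its own transfer and the previous kernel have finished. The names `makespan` and `best_order` are hypothetical, and the exhaustive search is only practical for a handful of tasks.

```python
from itertools import permutations


def makespan(order):
    """Total time for tasks given as (transfer_time, kernel_time) pairs."""
    transfer_end = 0.0  # time at which the copy engine becomes free
    kernel_end = 0.0    # time at which the compute engine becomes free
    for transfer, kernel in order:
        transfer_end += transfer
        # The kernel waits for its own data and for the previous kernel.
        kernel_end = max(kernel_end, transfer_end) + kernel
    return kernel_end


def best_order(tasks):
    """Exhaustively search all orders (factorial cost, small inputs only)."""
    return min(permutations(tasks), key=makespan)


if __name__ == "__main__":
    tasks = [(3.0, 1.0), (1.0, 3.0), (2.0, 2.0)]
    print("submission order:", makespan(tasks))
    print("best order found:", best_order(tasks), makespan(best_order(tasks)))
```

    Under this model, launching a transfer-light, kernel-heavy task first lets later transfers hide behind its computation, which is the kind of effect an ordering heuristic tries to exploit without enumerating every permutation.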

    Gait recognition applying Incremental learning

    When new knowledge needs to be included in a classifier, the model is usually retrained from scratch using a huge training set that contains all available information on both old and new knowledge. In this talk, however, we present a way to include new information in a previously trained model without training from scratch and using only a small subset of the old data. We perform a thorough experimental evaluation of the proposed approach on two image classification datasets: CIFAR-100 and ImageNet. The experimental results show that it is possible to include new knowledge in a model without forgetting the previous knowledge, although the performance is still lower than that of training from scratch with the complete training set. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech

    Low-textured regions detection for improving stereoscopy algorithms

    The main goal of stereoscopy algorithms is to calculate the disparity map between two frames of the same scene captured simultaneously by two different cameras. The difference in position (disparity) at which a common scene point is projected onto the two camera sensors can be used to calculate the depth of that point. Many algorithms calculate the disparity of corresponding points in both frames by relying on the existence of similarly textured areas around the pixels to be analyzed. Unfortunately, real images contain large areas with low texture, which hinder the calculation of the disparity map. In this paper we present a method that employs a set of local textures to build a classifier able to select reliable pixels where the disparity can be accurately calculated, improving the precision of the scene map obtained by the stereoscopic technique. Universidad de Málaga. Campus de Excelencia Internacional Andalucía Tech. Ministry of Education and Science of Spain under contract TIN2010-16144 and Junta de Andalucía under contract TIC-1692
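
    The link between disparity and depth mentioned above can be written out for the standard rectified two-camera setup. The symbols are assumptions introduced here, not notation from the paper: $f$ is the focal length in pixels, $B$ the baseline (distance between the camera centres), and $x_L$, $x_R$ the horizontal image coordinates of the same scene point in the left and right frames.

```latex
\[
  d = x_L - x_R,
  \qquad
  Z = \frac{f\,B}{d}
\]
% Illustrative values (hypothetical): f = 700 px, B = 0.12 m, d = 14 px
\[
  Z = \frac{700 \times 0.12}{14} = 6\ \text{m}
\]
```

    Because $Z$ is inversely proportional to $d$, a small matching error in a low-textured region can translate into a large depth error, which is why selecting reliable pixels matters.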